Perceptual audio features for emotion detection
Authors
Abstract

In this article, we propose a new set of acoustic features for automatic emotion recognition from audio. The features are based on the perceptual quality metrics given in the Perceptual Evaluation of Audio Quality (PEAQ) standard, ITU-R recommendation BS.1387. Starting from the outer- and middle-ear models of the auditory system, we base our features on the masked perceptual loudness, which defines re...
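The outer- and middle-ear stage the abstract starts from applies a fixed frequency weighting to the spectrum before loudness is computed. As a rough illustration only, here is a sketch of the weighting curve commonly associated with PEAQ; the function name is mine and the constants should be checked against the BS.1387 text before any real use:

```python
import numpy as np

def outer_middle_ear_weight_db(f_hz):
    """Approximate outer/middle-ear frequency response in dB, of the form
    used in PEAQ (ITU-R BS.1387); a sketch, not the recommendation itself."""
    f = np.asarray(f_hz, dtype=float) / 1000.0           # frequency in kHz
    return (-0.6 * 3.64 * f ** -0.8                      # low-frequency roll-off
            + 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)        # ~3.3 kHz ear-canal resonance
            - 1e-3 * f ** 3.6)                           # high-frequency attenuation

# The weighting peaks near the ear-canal resonance and falls off at both ends.
print(outer_middle_ear_weight_db([100.0, 1000.0, 3300.0, 15000.0]))
```

Spectral components are scaled by this weighting (in the power domain) before the masking and loudness stages, so features derived from masked loudness inherit the ear's band-pass character.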
Similar Articles

Extracting GFCC Features for Emotion Recognition from Audio Speech Signals
A major challenge for automatic speech recognition (ASR) is the significant performance degradation in noisy environments. This paper presents our implementation of Gammatone frequency cepstral coefficient (GFCC) filter-based features together with a BPNN classifier, and reports experimental results on English speech data. Through careful design, we obtained significant performance gains with the new featu...
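GFCC features replace the mel filterbank with a bank of gammatone filters that model cochlear frequency selectivity. A minimal sketch of a single gammatone impulse response follows; the function name is mine, and the ERB constants are the standard Glasberg and Moore values, not values taken from this paper:

```python
import numpy as np

def gammatone_ir(fc, sr, duration=0.025, order=4):
    """Impulse response of an order-4 gammatone filter centred at fc (Hz);
    a sketch of the filterbank front end behind GFCC features."""
    t = np.arange(int(duration * sr)) / sr
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)    # Glasberg-Moore ERB bandwidth (Hz)
    b = 1.019 * erb                            # gammatone bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))               # peak-normalise
```

A full GFCC front end would convolve the signal with such filters at ERB-spaced centre frequencies, then apply a nonlinearity and a DCT, analogous to the MFCC pipeline.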
Extracting MFCC Features for Emotion Recognition from Audio Speech Signals
A major challenge for automatic speech recognition (ASR) is the significant performance degradation in noisy environments. Recent research has shown that auditory features based on Gammatone filters are promising for improving the robustness of ASR systems against noise, though the research is far from extensive and the generalizability of the new features is unknown. This paper presents our implementat...
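The standard MFCC pipeline referred to in the title (windowed power spectrum, mel filterbank, log compression, DCT) can be sketched for a single frame as follows; function names and parameter defaults here are illustrative, not taken from the paper:

```python
import numpy as np

def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):                   # rising slope
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):                  # falling slope
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mfcc(frame, sr, n_filters=26, n_ceps=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    log_e = np.log(mel_filterbank(n_filters, n_fft, sr) @ spec + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_e          # first n_ceps cepstral coefficients
```

In practice the signal is split into overlapping 20-40 ms frames and this is applied per frame, often with delta and delta-delta coefficients appended.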
Group Delay Features for Emotion Detection
This paper focuses on speech-based emotion classification using acoustic data. The most commonly used acoustic features are pitch and energy, along with prosodic information such as the rate of speech. We propose a novel feature based on the phase response of an all-pole model of the vocal tract obtained from linear predictive coefficients (LPC), in addition to the aforementioned fe...
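The group delay of an all-pole vocal-tract model is the negative derivative of the phase response of H(z) = 1/A(z), where A(z) comes from an LPC fit. A hedged sketch of that computation, assuming the LPC coefficients are already available (the function name and the finite-difference discretisation are mine, not the paper's):

```python
import numpy as np

def allpole_group_delay(lpc_coeffs, n_points=512):
    """Group delay (negative phase derivative) of the all-pole model
    H(z) = 1 / A(z), with lpc_coeffs = [1, a1, ..., ap] from an LPC fit."""
    w = np.linspace(0.0, np.pi, n_points, endpoint=False)
    k = np.arange(len(lpc_coeffs))
    # Evaluate A(e^{jw}) on the upper half of the unit circle.
    A = np.exp(-1j * np.outer(w, k)) @ np.asarray(lpc_coeffs, dtype=float)
    phase = np.unwrap(np.angle(1.0 / A))
    return w[:-1], -np.diff(phase) / np.diff(w)   # finite-difference group delay
```

Peaks in the group delay tend to align with formant frequencies, which is what makes it attractive as a complement to magnitude-based features like pitch and energy.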
Perceptual Dimensions of Short Audio Clips and Corresponding Timbre Features
This study applied a multi-dimensional scaling approach to isolate a number of perceptual dimensions from a dataset of human similarity judgements for 800 ms excerpts of recorded popular music. These dimensions were mapped onto the 12 timbral coefficients from the Echo Nest's Analyzer. Two dimensions were identified by distinct coefficients; however, a third dimension could not be mapped and ma...
Journal

Journal: EURASIP Journal on Audio, Speech, and Music Processing
Year: 2012
ISSN: 1687-4722
DOI: 10.1186/1687-4722-2012-16